Preconditioned Temporal Difference Learning

Author

  • Yao HengShuai
Abstract

LSTD is numerically unstable for some ergodic Markov chains in which some states are visited far more often than the remaining ones, because the matrix that LSTD accumulates has a large condition number. In this paper, we propose a variant of temporal difference learning with high data efficiency. A class of preconditioned temporal difference learning algorithms is also proposed to speed up the new method. This class includes LSPE and several new data-efficient algorithms. The data efficiency of these algorithms is validated by learning an absorbing Markov chain. The asymptotic properties of the new algorithms are also analyzed.
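The conditioning problem mentioned in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's algorithms. Everything in it is an assumption made for illustration: the two-state toy chain, the one-hot features, the discount factor `gamma = 0.9`, and the helper names `expected_lstd_matrix` and `cond_2x2`. It computes the long-run average of the matrix that LSTD accumulates, A = D (I - gamma P), and its 2-norm condition number as one state becomes rarely visited.

```python
import math


def expected_lstd_matrix(eps, gamma=0.9):
    """Long-run average of sum_t phi_t (phi_t - gamma * phi_{t+1})^T
    for a two-state chain with one-hot features: A = D (I - gamma P)."""
    # Assumed toy chain (not from the paper): from either state, move to
    # state 0 with probability 1 - eps and to state 1 with probability eps.
    P = [[1.0 - eps, eps], [1.0 - eps, eps]]
    d = [1.0 - eps, eps]  # stationary distribution of this chain
    return [
        [d[i] * ((1.0 if i == j else 0.0) - gamma * P[i][j]) for j in range(2)]
        for i in range(2)
    ]


def cond_2x2(A):
    """2-norm condition number (s_max / s_min) of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = A
    frob2 = a * a + b * b + c * c + d * d  # s_max^2 + s_min^2
    det = abs(a * d - b * c)               # s_max * s_min
    # s_max^2 is the larger root of x^2 - frob2 * x + det^2 = 0.
    disc = math.sqrt(max(frob2 * frob2 - 4.0 * det * det, 0.0))
    s_max2 = (frob2 + disc) / 2.0
    return s_max2 / det  # s_max^2 / (s_max * s_min) = s_max / s_min


if __name__ == "__main__":
    for eps in (0.5, 0.1, 0.01, 0.001):
        print(eps, cond_2x2(expected_lstd_matrix(eps)))
```

In this toy setting the condition number grows as `eps` shrinks, that is, as visits concentrate on state 0. This is the kind of situation in which a preconditioner could plausibly help, though the sketch itself makes no claim about the specific preconditioners studied in the paper.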


Similar resources

Control of Multivariable Systems Based on Emotional Temporal Difference Learning Controller

One of the most important issues in controlling delayed systems and non-minimum phase systems is to fulfill objective orientations simultaneously and in the best way possible. This paper proposes a new method that presents an objective orientation for controlling multi-objective systems. The principles of this method are based on emotional temporal difference learning, and has a...


Solving large systems arising from fractional models by preconditioned methods

This study develops and analyzes preconditioned Krylov subspace methods to solve linear systems arising from discretization of the time-independent space-fractional models. First, we apply shifted Grünwald formulas to obtain a stable finite difference approximation to fractional advection-diffusion equations. Then, we employ two preconditioned iterative methods, namely, the preconditioned gen...


Preconditioned Iterative Methods for Two-Dimensional Space-Fractional Diffusion Equations

In this paper, preconditioned iterative methods for solving two-dimensional space-fractional diffusion equations are considered. The fractional diffusion equation is discretized by a second-order finite difference scheme, namely, the Crank-Nicolson weighted and shifted Grünwald difference (CN-WSGD) scheme proposed in [W. Tian, H. Zhou and W. Deng, A class of second order difference approximation...


On the Behavior of Combination High-Order Compact Approximations with Preconditioned Methods in the Diffusion-Convection Equation

In this paper, a family of high-order compact finite difference methods, in combination with preconditioned methods, is used to solve the Diffusion-Convection equation. We develop numerical methods by replacing the time and space derivatives with compact finite difference approximations. The resulting system of nonlinear finite difference equations is solved by preconditioned Krylov subspace m...


Application of temporal difference learning to the game of Snake

Application of Temporal Difference Learning to the Game of Snake, by Christopher Lockhart




Publication date: 2007